CLAIMS 



What is claimed is: 



1. A method for matrix transposition, the method comprising:
rotating in a vector register a first row of a matrix to generate a first row of
elements;
writing simultaneously into a plurality of look up units the first row of
elements indexed by a first row of indices in a vector register;
looking up simultaneously from the plurality of look up units a second row
of elements indexed by a second row of indices in a vector register;
and
rotating in a vector register the second row of elements to generate a third
row of elements.

2. A method as in claim 1 wherein each element of the matrix comprises a
plurality of bit segments, each of which is written into an entry of a different
unit of the plurality of look up units.

3. A method as in claim 1 wherein the plurality of look up units are configured
into a plurality of look up tables in response to receiving an instruction for
looking up a row of elements.

4. A method as in claim 1 further comprising:
concurrently rotating in a vector register a second row of the matrix to generate a
fourth row of elements while writing the first row of elements.

5. A method as in claim 4 wherein a row that needs no rotation is written into
look up units before other rows are written into the look up units.

6. A method as in claim 4 further comprising:
concurrently computing a third row of indices using the first row of indices
while writing the first row of elements.

7. A method as in claim 6 further comprising:
concurrently loading a row of the matrix from memory into a vector register
while writing the first row of elements.

8. A method as in claim 6 wherein:
the first row of indices are a first constant;
the third row of indices are a second constant; and
the first and second constants differ by one.

9. A method as in claim 6 wherein the third row of indices is a result of a
rotation of the first row of indices.

10. A method as in claim 1 further comprising:
concurrently rotating in a vector register a fifth row of elements to generate a
fourth row of elements while looking up the second row of elements.

11. A method as in claim 10 wherein a row of elements that needs no rotation is
looked up from the plurality of look up units after other rows are looked up 
from the plurality of look up units. 
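
As an illustration only, and not as part of the claims, the rotate, write, look up, rotate sequence of claims 1 to 11 can be modeled in software. The sketch below assumes a square N x N matrix, one look up unit per vector lane, and each look up unit modeled as a Python list of N entries. The names `rotate_right`, `transpose_via_lookup`, and `units` are hypothetical. The index schedule (a constant write index per row, and read indices that are a rotation of the lane numbers) is one choice consistent with the claims, not necessarily the one used in any actual embodiment.

```python
def rotate_right(vec, k):
    """Rotate a list right by k lanes (negative k rotates left)."""
    n = len(vec)
    k %= n
    return vec[-k:] + vec[:-k] if k else list(vec)


def transpose_via_lookup(matrix):
    """Transpose a square matrix using per-lane look up units (a software model)."""
    n = len(matrix)
    # units[lane][entry]: one look up unit per vector lane (assumption).
    units = [[None] * n for _ in range(n)]

    # Write phase: rotate row i right by i lanes, then write every lane
    # simultaneously at the constant index i.
    for i, row in enumerate(matrix):
        rotated = rotate_right(list(row), i)
        write_indices = [i] * n
        for lane in range(n):
            units[lane][write_indices[lane]] = rotated[lane]

    # Look up phase: for output row r, lane k reads entry (k - r) mod n,
    # then the looked-up row is rotated left by r lanes.
    result = []
    for r in range(n):
        read_indices = [(lane - r) % n for lane in range(n)]
        looked_up = [units[lane][read_indices[lane]] for lane in range(n)]
        result.append(rotate_right(looked_up, -r))
    return result


print(transpose_via_lookup([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# prints [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

In this model row 0 has a rotation amount of zero, which corresponds to the row that needs no rotation in claims 5 and 11.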

12. A machine readable media containing executable computer program 
instructions which when executed by a digital processing system cause said 
system to perform a method for matrix transposition, the method comprising: 
rotating in a vector register a first row of a matrix to generate a first row of
elements;
writing simultaneously into a plurality of look up units the first row of
elements indexed by a first row of indices in a vector register;
looking up simultaneously from the plurality of look up units a second row
of elements indexed by a second row of indices in a vector register;
and
rotating in a vector register the second row of elements to generate a third
row of elements.

13. A media as in claim 12 wherein each element of the matrix comprises a 
plurality of bit segments, each of which is written into an entry of a different
unit of the plurality of look up units. 

14. A media as in claim 12 wherein the plurality of look up units are configured 
into a plurality of look up tables in response to receiving an instruction for 
looking up a row of elements. 

15. A media as in claim 12 wherein the method further comprises: 
concurrently rotating in a vector register a second row of the matrix to generate a
fourth row of elements while writing the first row of elements.

16. A media as in claim 15 wherein a row that needs no rotation is written into 
look up units before other rows are written into the look up units. 

17. A media as in claim 15 wherein the method further comprises:
concurrently computing a third row of indices using the first row of indices
while writing the first row of elements.

18. A media as in claim 17 wherein the method further comprises:
concurrently loading a row of the matrix from memory into a vector register
while writing the first row of elements.

19. A media as in claim 17 wherein:
the first row of indices are a first constant;
the third row of indices are a second constant; and
the first and second constants differ by one.

20. A media as in claim 17 wherein the third row of indices is a result of a
rotation of the first row of indices. 

21. A media as in claim 12 wherein the method further comprises:
concurrently rotating in a vector register a fifth row of elements to generate a
fourth row of elements while looking up the second row of elements.

22. A media as in claim 21 wherein a row of elements that needs no rotation is 
looked up from the plurality of look up units after other rows are looked up 
from the plurality of look up units. 
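
As an illustration only, and not as part of the claims, the bit-segment arrangement of claims 2 and 13 can be sketched as follows. The sketch assumes 32-bit matrix elements split into four 8-bit segments, with segment `s` of vector lane `lane` stored in look up unit `lane * 4 + s`. These widths, the unit numbering, and all function names are assumptions made for the example and are not taken from the claims.

```python
def split_segments(value, seg_bits=8, seg_count=4):
    """Split an element into seg_count bit segments, least significant first."""
    mask = (1 << seg_bits) - 1
    return [(value >> (seg_bits * s)) & mask for s in range(seg_count)]


def join_segments(segments, seg_bits=8):
    """Reassemble an element from its bit segments."""
    value = 0
    for s, seg in enumerate(segments):
        value |= seg << (seg_bits * s)
    return value


def write_element(units, lane, index, value, seg_bits=8, seg_count=4):
    """Write each segment of one element into an entry of a different unit."""
    for s, seg in enumerate(split_segments(value, seg_bits, seg_count)):
        units[lane * seg_count + s][index] = seg


def read_element(units, lane, index, seg_bits=8, seg_count=4):
    """Look up the segments of one element and join them."""
    segs = [units[lane * seg_count + s][index] for s in range(seg_count)]
    return join_segments(segs, seg_bits)


# Two lanes of 32-bit elements need eight 8-bit units (hypothetical sizes).
units = [[0] * 4 for _ in range(8)]
write_element(units, lane=1, index=2, value=0xDEADBEEF)
print(hex(read_element(units, lane=1, index=2)))  # prints 0xdeadbeef
```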

23. A processing system for matrix transposition, the system comprising: 
means for rotating in a vector register a first row of a matrix to generate a
first row of elements;
means for writing simultaneously into a plurality of look up units the first
row of elements indexed by a first row of indices in a vector register;
means for looking up simultaneously from the plurality of look up units a
second row of elements indexed by a second row of indices in a
vector register; and
means for rotating in a vector register the second row of elements to generate
a third row of elements.

24. A processing system as in claim 23 wherein each element of the matrix 
comprises a plurality of bit segments, each of which is written into an entry 
of a different unit of the plurality of look up units. 

25. A processing system as in claim 23 wherein the plurality of look up units are
configured into a plurality of look up tables in response to receiving an 
instruction for looking up a row of elements. 

26. A processing system as in claim 23 further comprising:
means for concurrently rotating in a vector register a second row of the matrix to
generate a fourth row of elements while writing the first row of
elements.

27. A processing system as in claim 26 wherein a row that needs no rotation is 
written into look up units before other rows are written into the look up units. 

28. A processing system as in claim 26 further comprising: 

means for concurrently computing a third row of indices using the first row 
of indices while writing the first row of elements. 

29. A processing system as in claim 28 further comprising:
means for concurrently loading a row of the matrix from memory into a
vector register while writing the first row of elements.

30. A processing system as in claim 28 wherein:
the first row of indices are a first constant;
the third row of indices are a second constant; and
the first and second constants differ by one.

31. A processing system as in claim 28 wherein the third row of indices is a
result of a rotation of the first row of indices.

32. A processing system as in claim 23 further comprising:
means for concurrently rotating in a vector register a fifth row of elements to
generate a fourth row of elements while looking up the second row of
elements.

33. A processing system as in claim 32 wherein a row of elements that needs no
rotation is looked up from the plurality of look up units after other rows are
looked up from the plurality of look up units.
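
As an illustration only, and not as part of the claims, the two ways of deriving a third row of indices from a first row of indices (claims 30 and 31, and likewise claims 8 and 9) can be sketched as below. The function names are hypothetical, and the rotation direction and the default amount of one lane are assumptions; the claims state only that the constants differ by one or that the new row is a rotation of the old one.

```python
def next_constant_indices(indices):
    """Constant case: every lane holds the same index, incremented by one."""
    return [i + 1 for i in indices]


def next_rotated_indices(indices, amount=1):
    """Rotation case: the index row is rotated right by `amount` lanes."""
    n = len(indices)
    amount %= n
    return indices[-amount:] + indices[:-amount] if amount else list(indices)


first_row_of_indices = [0, 0, 0, 0]
print(next_constant_indices(first_row_of_indices))  # prints [1, 1, 1, 1]
print(next_rotated_indices([0, 1, 2, 3]))           # prints [3, 0, 1, 2]
```

Either computation is independent of the data being written, which is why it can run while the first row of elements is being written into the look up units.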

34. A processing system for matrix transposition, the system comprising:
a vector register file comprising a plurality of vector registers;
a vector processing unit coupled to the vector register file, the vector
processing unit comprising a vector look up unit, the vector look up
unit comprising a plurality of look up units adapted to look up a
vector of data items simultaneously, the vector processing unit:
rotating in a vector register in the vector register file a first row of a matrix to
generate a first row of elements;
writing simultaneously into the plurality of look up units the first row of
elements indexed by a first row of indices in a vector register in the
register file;
looking up simultaneously from the plurality of look up units a second row
of elements indexed by a second row of indices in a vector register in
the register file; and
rotating in a vector register in the vector register file the second row of
elements to generate a third row of elements.

35. A processing system as in claim 34 wherein each element of the matrix
comprises a plurality of bit segments, each of which is written into an entry
of a different unit of the plurality of look up units.

36. A processing system as in claim 34 wherein the plurality of look up units are
configured into a plurality of look up tables in response to receiving an
instruction for looking up a row of elements.

37. A processing system as in claim 34 wherein the vector processing unit
concurrently rotates in a vector register a second row of the matrix to generate a
fourth row of elements while writing the first row of elements.

38. A processing system as in claim 37 wherein a row that needs no rotation is
written into look up units before other rows are written into the look up units.

39. A processing system as in claim 37 wherein the vector processing unit
concurrently computes a third row of indices using the first row of indices
while writing the first row of elements.

40. A processing system as in claim 39 wherein the vector processing unit
concurrently loads a row of the matrix from memory into a vector register
while writing the first row of elements.

41. A processing system as in claim 39 wherein:
the first row of indices are a first constant;
the third row of indices are a second constant; and
the first and second constants differ by one.

42. A processing system as in claim 39 wherein the third row of indices is a
result of a rotation of the first row of indices.

43. A processing system as in claim 34 wherein the vector processing unit
concurrently rotates in a vector register a fifth row of elements to generate a
fourth row of elements while looking up the second row of elements.

44. A processing system as in claim 43 wherein a row of elements that needs no 
rotation is looked up from the plurality of look up units after other rows are 
looked up from the plurality of look up units. 